Draft version February 1, 2008 

Preprint typeset using LaTeX style emulateapj



COMPLETENESS IN PHOTOMETRIC AND SPECTROSCOPIC SEARCHES FOR CLUSTERS 

Martin White 1 and C.S. Kochanek 2 

1 Departments of Physics and Astronomy, University of California, Berkeley, CA 94720 
2 Harvard-Smithsonian Center for Astrophysics, 60 Garden St., Cambridge, MA 02138 
email: mwhite@astron.berkeley.edu, ckochanek@cfa.harvard.edu 

ABSTRACT 

We investigate, using simulated galaxy catalogues, the completeness of searches for massive clusters of 
galaxies in redshift surveys or imaging surveys with photometric redshift estimates, i.e. what fraction of 
clusters ($M > 10^{14}\,h^{-1}M_\odot$) are found in such surveys. We demonstrate that the matched filter method
provides an efficient and reliable means of identifying massive clusters even when the redshift estimates 
are crude. In true redshift surveys the method works extremely well. We demonstrate that it is possible 
to construct catalogues with high completeness, low contamination and both varying little with redshift. 

Subject headings: cosmology: theory - large-scale structure of Universe 



1. INTRODUCTION 

Clusters of galaxies are one of our most important cos- 
mological probes. As the most recent objects to form 
in the universe their number density and properties are 
exquisitely sensitive to our modeling assumptions. Their 
composition accurately reflects the mix of matter in the 
universe. They are bright and can be "easily" seen to 
large distances, allowing constraints on the crucial interval
$0 < z \lesssim 1$ where the universal expansion changes from deceleration
to acceleration. They are located close to their
formation site. Being bright and sparse they are excel- 
lent tracers of the large-scale structure - they are highly 
biased so their clustering is easy to measure and is much 
more straightforwardly computed from theory than that 
of galaxies. 

However, constructing large samples of massive clusters
for statistical analyses remains a difficult task. The original
samples (e.g. Abell 1958; Dalton et al. 1992; Lumsden
et al. 1996; White et al. 1999) were selected on the basis
of projected galaxy overdensity, but it was quickly realized
that such surveys suffer from projection effects and
the large scatter between optical richness and cluster mass
(for recent theoretical studies see e.g. van Haarlem, Frenk
& White 1997; Reblinsky & Bartelmann 1999). For this
reason attention has broadened to include searches in complete
redshift surveys (e.g. Geller & Huchra 1983; Ramella
et al. 1994; Ramella, Pisani & Geller 1997), surveys at
X-ray wavelengths (Gioia et al. 1990; Edge et al. 1990;
Henry & Arnaud 1991; Rosati et al. 1995; Jones et al. 1998;
Ebeling et al. 1998; Vikhlinin et al. 1998; Romer et al. 2000;
Henry 2000; Blanchard et al. 2000; Scharf et al. 2000), and
using the Sunyaev-Zel'dovich effect (Carlstrom et al. 2000).
More recently, however, there has been significant progress
in optical surveys, both in terms of data quality and
algorithmic sophistication. The introduction of accurate,
multi-color photometry has allowed estimation of
"photometric redshifts", which can mitigate many of the
problems of foreground-background contamination, and
carefully applied filters can find cluster signals even with
low numbers of cluster galaxies.

In this work we report preliminary investigations into
how well a deep, multi-color optical survey would find the
most massive clusters of galaxies. We envision this as a
first step in a programme which would then obtain multi-wavelength
information about a sample so selected in order
to constrain the evolution of the mass function. We contrast
this with the yield expected from a shallower redshift
survey such as could be done with the Hectospec instrument
on the MMT (see Geller 1994).



2. CLUSTERS AND DARK ENERGY 

A recent motivation for revisiting this question, and for 
investigating strategies which can allow us to construct a 
large, well characterized sample of the rarest clusters over 
the widest area possible, is the ability of clusters to shed 
light on the nature of the dark energy believed to be caus- 
ing the accelerated expansion of the universe. The nature 
of this dark energy is one of the most vexing problems in 
cosmology and one with strong implications for our under- 
standing of fundamental physics. 

Since the dark energy is predicted to be smooth, except
possibly near the horizon scale, all of its cosmological
effects come in through its effect on the expansion rate
$H(z)$. Specifically it alters the distance-redshift relations,
cosmological volumes and the growth of perturbations, all
of which are integrals of the inverse Hubble parameter over
redshift. In order to best constrain the dark energy it is
desirable to probe the crucial redshift range $z \sim 0$-$1$, where
it begins to noticeably affect the expansion rate, with as
much resolution in redshift as possible. Several authors
(most recently Haiman, Mohr & Holder 2001) have suggested
using the counts of clusters of galaxies to probe the
evolution of the dark energy in this redshift range.

The strongest cosmological constraints come from rel- 
atively massive clusters, which are intrinsically rare. In 
order to construct a large sample of massive clusters at 
lower redshifts ($z \lesssim 1$; where detailed followup observations
are conceivable), we need to cover a large area of
sky. This is difficult to do with existing facilities for X-ray 
or Sunyaev-Zel'dovich (SZ) observations. Such a sample, 
selected optically, would provide a much needed comple- 
ment to the higher redshift clusters found by SZ surveys 
over smaller areas of the sky. Once plausible cluster candidates
have been found, multi-wavelength followup is possible
(and necessary) to help pin down the 'local' sample
and the normalization of the scaling relations which can
convert observables into cluster mass. 

3. SIMULATED OBSERVATIONS 

A realistic search for clusters requires a good match to 
the spatial distribution of galaxies and to their mean den- 
sity, rather than a thorough understanding of galaxy for- 
mation. We use high resolution N-body simulations for 
the evolution of the dark matter, described in §3.1, to pro- 
vide the large scale structure and clustering of the matter 
distribution. We find that N-body based models are signif- 
icantly better than Poisson models in describing the fluc- 
tuations in the galaxy background which are important in 
cluster finding. Next, we populate the dark matter halos
with galaxies as described in §3.2. Finally, we produce a
simulated observational catalog as described in §3.3. Some
of the limitations of our procedure are discussed in §3.4.

3.1. N-body simulation 

On large scales (Mpc and above) the distribution of 
galaxies will trace that of the dark matter, so we can 
use N-body simulations to provide a model for the large 
scale structure and the initial formation of gravitationally 
bound halos. We have run a $256^3$ particle simulation of a
$\Lambda$CDM model in a $200\,h^{-1}$Mpc box using the TreePM-SPH
code (White et al. 2001) operating in collisionless (dark
matter only) mode. This simulation represents a large
cosmological volume, to include a fair sample of rich clusters,
while maintaining enough mass resolution to identify
galactic mass halos (see §3.2). Because it provides
a reasonable fit to a wide range of observations, including
the present day abundance of rich clusters of galaxies
(Pierpaoli, Scott & White 2001), we have simulated the
"concordance cosmology" of Ostriker & Steinhardt (1995),
which has $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, $H_0 = 100\,h\,{\rm km\,s^{-1}\,Mpc^{-1}}$
with $h = 0.67$, $\Omega_B = 0.04$, $n = 1$ and $\sigma_8 = 0.9$ (corresponding
to $\delta_H = 5.02 \times 10^{-5}$). The simulation was
started at $z = 50$ and evolved to the present with the
full phase space distribution dumped every $100\,h^{-1}$Mpc
from $z \sim 2$ to $z = 0$. The gravitational force softening
was of a spline form (e.g. Hernquist & Katz 1989), with
a "Plummer-equivalent" softening length of $28\,h^{-1}$kpc comoving.
The particle mass is $4 \times 10^{10}\,h^{-1}M_\odot$, allowing us
to find bound halos with masses several times $10^{11}\,h^{-1}M_\odot$
and giving many, many particles in a cluster mass halo
($> 10^{14}\,h^{-1}M_\odot$) to begin to resolve substructure.
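As a rough consistency check on the numbers quoted above, the particle mass follows from the box size, particle number and $\Omega_m$. The sketch below is illustrative only: the function name `particle_mass` is ours, and the critical density value $2.775 \times 10^{11}\,h^2 M_\odot\,{\rm Mpc}^{-3}$ is the standard one rather than a number taken from the text.

```python
# Mean matter density and N-body particle mass, in h-scaled units.
RHO_CRIT_H2 = 2.775e11  # critical density today, h^2 Msun / Mpc^3 (standard value)

def particle_mass(box_size, n_per_side, omega_m):
    """Mass per particle (h^-1 Msun) for a cubic box of side box_size (h^-1 Mpc)."""
    total_mass = omega_m * RHO_CRIT_H2 * box_size**3
    return total_mass / n_per_side**3

m_p = particle_mass(box_size=200.0, n_per_side=256, omega_m=0.3)
min_halo_mass = 8 * m_p            # mass of an 8-particle group
particles_in_cluster = 1e14 / m_p  # particles in a 10^14 h^-1 Msun halo
print(f"{m_p:.2e} {min_halo_mass:.2e} {particles_in_cluster:.0f}")
```

The result is close to $4 \times 10^{10}\,h^{-1}M_\odot$ per particle, so a halo at the $10^{14}\,h^{-1}M_\odot$ mass scale quoted in the abstract contains a few thousand particles.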

We identify the real clusters in the sample using the 3D 
dark matter distribution and the friends-of-friends (FoF) 
algorithm. For each cluster we calculate directly from the 
3D distribution the mass (we use $M_{200}$, the mass enclosed
within a radius, $r_{200}$, within which the mean density is 200
times the critical density at that redshift), velocity dis- 
persion etc. so we can understand our selection in terms 
of the intrinsic, rather than projected, cluster properties. 
We define the center of a cluster as the position of the po- 
tential minimum, calculating the potential using only the 
particles in the FoF group. This proved to be more robust 
than using the center of mass, as the potential minimum 
coincided closely with the density maximum for all but the 
most disturbed clusters. We show the mass function in the 
box at various redshifts in Fig. 1.
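The definition of $M_{200}$ above fixes a one-to-one relation between mass and radius, $M_{200} = (4\pi/3)\,200\,\rho_{\rm crit}(z)\,r_{200}^3$. A minimal sketch of that conversion follows, assuming a flat model with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ and proper (not comoving) radii; the function names are ours.

```python
import math

RHO_CRIT_H2 = 2.775e11  # critical density at z = 0, h^2 Msun / Mpc^3 (standard value)

def rho_crit(z, omega_m=0.3, omega_l=0.7):
    """Critical density at redshift z for a flat LCDM model (h^2 Msun / Mpc^3)."""
    return RHO_CRIT_H2 * (omega_m * (1.0 + z)**3 + omega_l)

def r200(m200, z):
    """Proper radius (h^-1 Mpc) enclosing a mean density of 200 rho_crit(z)."""
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * rho_crit(z)))**(1.0 / 3.0)

def m200(radius, z):
    """Inverse relation: mass (h^-1 Msun) within a proper radius (h^-1 Mpc)."""
    return 4.0 * math.pi / 3.0 * 200.0 * rho_crit(z) * radius**3

print(f"r200 of a 1e14 halo at z=0: {r200(1e14, 0.0):.3f} h^-1 Mpc")
```

At fixed mass the radius shrinks with redshift because the critical density rises.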



3.2. Adding galaxies 

We added galaxies to the simulation using a variant of 
the "halo model" for large-scale structure wherein grav- 
itational clustering is described in terms of dark matter 
halos which form a biased tracer of the large-scale density 
field. Galaxies are distributed in halos following the dark 
matter profile with an occupation number which charac- 
terizes the efficiency of galaxy formation. This method 
produces galaxy distributions which are in agreement with 
those produced by semi-analytic models of galaxy formation
(Kauffmann et al. 1999; Somerville & Primack 1999;
Benson et al. 2000) and high resolution hydrodynamic
simulations including star formation and feedback (Katz,
Hernquist & Weinberg 199?; Gardner et al. 2001; Pearce
et al. 1999; White, Hernquist & Springel 2001) and can
match the observed low-order clustering statistics of galaxies
(e.g. Jing et al. 1998; Benson et al. 2000; Seljak 2000;
Peacock & Smith 2000; Scoccimarro et al. 2001; Scoccimarro
& Sheth 2001).

Our methodology is somewhat simpler than the full
semi-analytic treatments described above, more closely approximating
that of Kauffmann, Nusser & Steinmetz (1997).
For every output of the simulation we produce a halo catalogue
by running a "friends-of-friends" (FoF) group finder
with a linking length $b = 0.2$ in units of the mean inter-particle
separation. This procedure partitions
particles into equivalence classes by linking together all
particles separated by less than distance $b$. We keep all
groups above 8 particles, which imposes a minimum halo
mass of $3 \times 10^{11}\,h^{-1}M_\odot$. A slightly smaller minimum mass
would be preferable, but with fixed dynamic range would
come at the expense of less volume. For simplicity we take
the halo "mass" to be the sum of the particle masses in
the FoF group. We populate each halo with an integer
number, $N$, of "galaxies". Each halo is a host to galaxies
of two types. The first, or central galaxy, is placed at
the center of mass and inherits the center of mass velocity.
Any additional galaxies are assumed to be satellites
and are laid down tracing the distribution of mass in the
halo, including asymmetry and sub-structure, and inherit
the velocity of the nearest dark matter particle. This spatial
behavior is as seen in a recent hydrodynamic model
of galaxy formation (White, Hernquist & Springel 2001)
and is assumed in the halo model. By having the galaxies
trace the 3D density structure in the halo rather than an
azimuthally averaged radial profile (such as the Navarro,
Frenk & White 1996 profile) we avoid producing artificially
"spherical" clusters. For ease of later identification,
we tag each galaxy with the mass of its parent halo and
mark "central" galaxies as such.

Fig. 1. - The 3D mass function of halos in our simulation box at
$z = 0$, 0.49 and 0.99. Masses are $M_{200}$, the mass enclosed within a
radius, $r_{200}$, within which the mean density is 200 times the critical
density at that redshift. Error bars indicate purely Poisson errors.
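The friends-of-friends grouping used to build the halo catalogue can be sketched as a connected-components search. This is a toy illustration, not the group finder used for the simulation: it is brute force, ignores the periodic boundaries of the box, and the function name is ours. If $b = 0.2$ is measured in units of the mean inter-particle separation (the usual convention, assumed here), the physical linking length for $256^3$ particles in a $200\,h^{-1}$Mpc box would be about $0.2 \times 200/256 \approx 0.16\,h^{-1}$Mpc.

```python
import math

def friends_of_friends(points, linking_length, min_members=1):
    """Group 3D points: any two points closer than linking_length are linked,
    and groups are the connected components of the resulting graph.
    Brute-force O(N^2) pair search, suitable for small examples only."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < linking_length:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_members]

points = [(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5, 5, 5), (5.1, 5, 5), (9, 9, 9)]
print(friends_of_friends(points, linking_length=0.15, min_members=2))
```

Note that the first and third points are further apart than the linking length but end up in the same group because both are linked to the second; this chaining is what makes the groups equivalence classes.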

The number of galaxies in each halo is drawn from a distribution
whose moments we take from the semi-analytic
models of galaxy formation of Kauffmann et al. (1999) as
fit by Sheth & Diaferio (2001). Following Scoccimarro et
al. (2001) we model the distribution of $N$ as a binomial.
For simplicity we use the same $N(M)$ at all redshifts.
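One way to draw an occupation number from a binomial with prescribed low-order moments is sketched below. It assumes, purely for illustration, that the second moment is supplied through a parameter `alpha` defined by $\langle N(N-1)\rangle = \alpha^2 \langle N\rangle^2$; for a binomial with $n$ trials this gives $\alpha^2 = 1 - 1/n$. The function names, the rounding of $n$ to an integer and the guard that keeps $p \le 1$ are our own choices, not taken from the references above.

```python
import math
import random

def binomial_params(mean_n, alpha):
    """Binomial (n_trials, p) with <N> = mean_n and, up to integer rounding
    of n_trials, <N(N-1)> = alpha^2 <N>^2."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("a binomial needs 0 <= alpha < 1 (sub-Poisson scatter)")
    n_trials = max(1, round(1.0 / (1.0 - alpha**2)))
    n_trials = max(n_trials, math.ceil(mean_n))  # ensure p = mean_n / n_trials <= 1
    return n_trials, mean_n / n_trials

def draw_occupation(mean_n, alpha, rng):
    """Draw one integer galaxy count as a sum of Bernoulli trials."""
    n_trials, p = binomial_params(mean_n, alpha)
    return sum(rng.random() < p for _ in range(n_trials))

rng = random.Random(42)
sample = [draw_occupation(3.0, 0.9, rng) for _ in range(10)]
print(sample)
```

The mean is matched exactly by construction, while the second moment is only matched approximately because the number of trials must be an integer.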

Unfortunately, a simple implementation of these algorithms
poorly reproduces real observations because it
underpredicts the observed numbers of galaxies by approximately
a factor of three. The missing galaxies arise
because the available models for the halo multiplicity func- 
tion are calibrated to match particular flux-limited sam- 
ples (e.g. the APM survey) rather than providing general 
expressions as a function of galaxy luminosity. 

Since our ability to characterize the search for clusters
depends critically on the actual numbers of galaxies as
well as their spatial distribution, we adjusted the models
to better match the observed density of galaxies. The basic
problem is that the number of galaxies should vary
with the minimum luminosity as $\Gamma[1+\alpha, L/L_*]$ where
$\alpha$ is the faint-end slope of the luminosity function, modeled
as a Schechter function (see Eq. 2) with characteristic
luminosity $L_*$. The standard halo multiplicity expressions
were normalized at a luminosity limit $L/L_* \sim 1/2$
and our model surveys need to include galaxies down to
$L/L_* \sim 1/10$ or even lower to correctly account for the
observed number of galaxies. We achieve this by steepening
the high-mass slope of the multiplicity function and
including galaxies corresponding to lower mass halos. In
clusters we roughly double the number of galaxies in a
$10^{15}\,h^{-1}M_\odot$ cluster, and with a comparable increase in
the number of galaxies in low mass halos we preserve the
contrast between the clusters and the background. Because
of the limited dynamic range in the simulations we
cannot directly probe smaller mass halos, so we considered
as 'galaxies' a fraction of the un-grouped particles
in the simulation chosen so as to have about as many ungrouped
galaxies as grouped galaxies. The ungrouped particles
have similar clustering properties to the lowest mass
halos.

Fig. 2. - The mean number of galaxies in a halo of mass $M$. The
dashed line shows the fit to the semi-analytic model of Kauffmann et
al. (1999), the dotted line the estimate of Peacock & Smith (2000)
which is zero below $10^{11.8}\,h^{-1}M_\odot$. The solid line is the functional
form used in this work. The vertical dotted line marks the mass of
an 8 particle halo.

With these modifications our galaxy sample maintains
the properties of the spatial distribution needed for a realistic
model while raising the comoving density of galaxies
closer to the observed density. For example, the galaxy
sample has an approximately power-law correlation function
and power spectrum on small scales: over the range
$0.5\,h^{-1}{\rm Mpc} < r < 10\,h^{-1}$Mpc the galaxy correlation function
is well fit by $\xi(r) = (r/r_0)^{-\gamma}$ with $r_0 = 5\,h^{-1}$Mpc
and $\gamma = 1.8$ and $\sigma_8^{\rm gal} \simeq 0.9$. With $\sim$450,000 galaxies in
the $200\,h^{-1}$Mpc box at $z \sim 0$, the total comoving density
of galaxies is close to that implied by the LCRS (Lin et
al. 1996) luminosity function.



3.3. Simulating a field 

We simulate an observed field by "stacking" different 
slices through the box at earlier and earlier output times. 
We divide every output up into 6 "halves" (top, bottom,
left, right, front, back) of $200 \times 200 \times 100\,h^{-1}$Mpc. A given
observational field is then simulated by dividing the line-of-sight
up into $100\,h^{-1}$Mpc pieces stepping back from the
observer. For each piece we choose one half of the box at
the appropriate redshift, shifted perpendicular to the line-of-sight
by a random amount using the periodicity of the
simulation volume. A fraction of the galaxies in that half
of the box are projected onto the sky at the appropriate
location with the appropriate redshift, including the peculiar
velocity of the galaxy. We have chosen $100\,h^{-1}$Mpc as
our sampling interval because it is large enough that edge
effects are minimal even for rich clusters while being fine
enough that line-of-sight integrals are well approximated
by sums over the (static) outputs. However, even though
only a small fraction of clusters lie within $r_{200}$ of a slice
boundary, we decided to require that the orientation and
offset change only on every second slice. Thus if we choose
at one redshift the front of the box the next slice is required
to be the back. In this manner a cluster on the boundary is
almost always included, though the periodicity of the box 
is artificial. 
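The pairing rule for consecutive slices can be sketched as follows. This is only a schematic of the bookkeeping: the names `HALVES` and `slice_schedule` are ours, the offset is drawn uniformly across the box as an assumption, and the choice of simulation output for each slice's redshift is not modeled.

```python
import random

# Each half of the box and its complement along the same axis.
HALVES = {"top": "bottom", "bottom": "top", "left": "right",
          "right": "left", "front": "back", "back": "front"}

def slice_schedule(n_slices, box=200.0, seed=0):
    """Return a list of (half, (dx, dy)) for successive line-of-sight slices.
    A new orientation and transverse offset are drawn on every second slice;
    the slice in between uses the complementary half with the same offset."""
    rng = random.Random(seed)
    schedule = []
    for i in range(n_slices):
        if i % 2 == 0:
            half = rng.choice(sorted(HALVES))
            offset = (rng.uniform(0.0, box), rng.uniform(0.0, box))
        else:
            half = HALVES[schedule[-1][0]]  # complementary half, offset unchanged
        schedule.append((half, offset))
    return schedule

for half, offset in slice_schedule(4, seed=1):
    print(half, offset)
```

Keeping the offset fixed within a pair is what lets a cluster straddling the shared boundary appear intact in the stacked field.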

In addition to these fields we also generated "Poisson" 
fields in which the galaxy positions in each simulation box 
were randomized before being placed into the map. These 
fields were used to estimate likelihood thresholds for the 
cluster finding described in §4. In practice, we found that
the Poisson fields had such low likelihoods for clusters com- 
pared to the real data that they were of little use. 

Fig. 3. - Wedge diagrams of parts of 3 of our fields. In each case
we plot redshift against projected separation (comoving) in right ascension
in a wedge 1.5° thick. The surveys, offset for clarity, are
our model MMT survey (left) subtending 6° in RA; our model SDSS
survey (middle) subtending 3° in RA; and our model LSST survey
(right) subtending 1.5°. Each dot represents a 'galaxy' and all galaxies
are plotted.

Although we assign galaxies to halos based on the mass
of their parent halo, we assign luminosities to the galaxies
randomly based on a model luminosity function. The luminosity
function enters our calculation only by defining
the distance-dependent probability that a galaxy is sufficiently
luminous to be included in the final catalog. We
assume a luminosity function $\phi(L, z)$ where the redshift
dependence enters only through the evolution of a characteristic
luminosity $L_*(z)$ following a conservative $z_f = 2$
burst evolution model. If at redshift $z$ we can detect galaxies
brighter than $L(z)$ given our model flux limit, then the
probability of including a galaxy at redshift $z$ is



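$$p(z) = \frac{\int_{L(z)}^{\infty} \phi(L, z)\,dL}{\int_{0}^{\infty} \phi(L, z=0)\,dL}. \qquad (1)$$

For a Schechter luminosity function,

$$\phi(L) = (n_*/L_*)(L/L_*)^{\alpha}\exp(-L/L_*), \qquad (2)$$

the function becomes

$$p(z) = \frac{\Gamma[1+\alpha, L(z)/L_*(z)]}{\Gamma[1+\alpha, 0]} \qquad (3)$$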



where we will base our luminosity function on the LCRS R-band
luminosity function (Lin et al. 1996) with $\alpha = -0.70$.
The resulting number of galaxies is very sensitive to the
treatment of the low mass, low luminosity galaxies. Because
our simulations do not treat low mass halos well, we
modified the LCRS luminosity function so as to produce
surveys with galaxy surface densities closer to those observed.
For luminosities $L < L_{\rm cut} = 0.1L_*$ we truncated the
luminosity function as $\phi(L) = (L/L_{\rm cut})\,\phi_{\rm LCRS}(L_{\rm cut})$. In
a survey to $R = 20$ mag, this modification increases the
surface density of galaxies from 1100 per square degree to
1700 per square degree with no significant changes in the
redshift distribution. Figure 4 shows the selection function
$p(z)$ for a range of limiting magnitudes.
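The incomplete gamma functions that control both the galaxy counts and the selection function can be evaluated with a short series. The sketch below is a hedged illustration: it assumes an unmodified Schechter function with $\alpha = -0.70$, assumes the selection probability is normalized by the integral over all luminosities, and takes $x = L(z)/L_*(z)$ as an input rather than deriving it from a flux limit and evolution model. The function names are ours.

```python
import math

def upper_incomplete_gamma(a, x, tol=1e-12):
    """Gamma(a, x) = int_x^inf t^(a-1) exp(-t) dt for a > 0 and x >= 0,
    computed as Gamma(a) minus the series for the lower incomplete gamma.
    Adequate for modest x; precision degrades when Gamma(a, x) << Gamma(a)."""
    if x == 0.0:
        return math.gamma(a)
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= x / (a + n)  # x^n / (a (a+1) ... (a+n))
        total += term
    lower = x**a * math.exp(-x) * total
    return math.gamma(a) - lower

def selection_fraction(x, alpha=-0.70):
    """Fraction of Schechter-function galaxies brighter than x = L / L_*."""
    a = 1.0 + alpha
    return upper_incomplete_gamma(a, x) / math.gamma(a)

# Relative counts when the limit moves from L/L_* = 1/2 to 1/10.
ratio = upper_incomplete_gamma(0.3, 0.1) / upper_incomplete_gamma(0.3, 0.5)
print(f"count ratio: {ratio:.2f}")
for x in (0.1, 0.5, 1.0, 2.0):
    print(x, round(selection_fraction(x), 4))
```

Under these assumptions, lowering the luminosity limit from $L_*/2$ to $L_*/10$ raises the counts by a factor of roughly 2.4, the same order as the factor of three shortfall described in §3.2.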

For simplicity we do not attempt to assign luminosities 
or colors to the galaxies, but characterize them only by 
the Gaussian uncertainty in their redshifts. For typical 
luminosity functions, the flux of a galaxy is sufficient to 
determine the redshift with an uncertainty of $\sigma_z \sim z/2$.
This sets an upper bound on the redshift uncertainties for
nearby galaxies. For photometric redshifts we will explore
$\sigma_z = 0.05$ and 0.10. For spectroscopic redshifts we used a
very conservative uncertainty of $\sigma_z = 0.01$ to smooth the
distribution on scales somewhat larger than the velocity 
dispersions of rich clusters. 

We considered three survey models motivated by the ongoing
or proposed photometric and redshift surveys. The
first example is a complete redshift survey to $R = 20$ mag
as might be conducted with the Hectospec fiber instrument
on the 6.5m MMT. This sample would be ten times
deeper (in flux) than the current generation of redshift surveys
(LCRS, 2dF and SDSS). The second example is motivated
by the SDSS survey. It consists of a complete redshift
survey to $R = 17.5$ mag, a sparse, red galaxy-biased
redshift survey to $R = 20$ mag, and a photometric survey
to $R = 22$ mag. The Kauffmann et al. (1999) model
provides separate halo mass-dependent estimates for the
number of red and blue galaxies. All galaxies brighter than
$R = 17.5$ mag and 4% of the red galaxies (1% of all galaxies)
between $R = 17.5$ mag and $R = 20$ mag are assigned
spectroscopic redshifts. The remaining galaxies between
$R = 17.5$ mag and $R = 20$ mag and the galaxies between
$R = 20$ mag and $R = 22$ mag are assigned photometric redshifts.
The final example is a deep photometric survey to
$R = 24$ mag as might be done with the LSST. We assume
the survey is conducted in an SDSS region and includes
the SDSS spectroscopic redshifts. The properties of the
model surveys are summarized in Table 1.

3.4. Limitations 

The primary limitation in interpreting our results is that 
our model surveys consistently contain too few galaxies. 
For limiting magnitudes of $R_c = 20$, 22 and 24 mag we
have 1700, 8700 and 32000 galaxies per square degree compared
to observed counts of 2400, 14000 and 81000 galaxies
per square degree based on the Gunn-r counts from
McLeod & Rieke (1995) and a color of $R_c = r - 0.35$. These
undercounts are present despite our modifications to the 
halo multiplicity function and the luminosity function. In 
the absence of numerical resolution effects, simply scaling 
up N(M) would not affect the clustering of our galaxies, 
but may not be the most physically realistic solution since 
it implies relatively low mass halos would be hosts to sev- 
eral galaxies. While it is plausible that the simulations 
undercount the halos which will be low luminosity galax- 
ies or that the luminosity function genuinely turns down 
at low luminosity, it would not be physically realistic to 
raise our break luminosity above $L_{\rm cut} = L_*/10$.

However, if our model galaxy distribution adequately 
reproduces the statistics of real galaxy distributions, as 
seems to be the case, the primary consequence of the lower 
number of galaxies is to add Poissonian noise to our search. 
The Poisson noise level will be 20%, 27% and 60% higher 
than in a real survey to $R_c = 20$, 22 and 24 mag, which is
not a severe increase. Since real samples should have more 
galaxies, our results should be conservative. 
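The quoted noise penalties follow from the galaxy counts if the fractional Poisson noise scales as $1/\sqrt{N}$, which is the assumption made in this short check (the variable and function names are ours):

```python
import math

# Galaxies per square degree at R_c = 20, 22 and 24 mag, as quoted in the text.
model = {20: 1700, 22: 8700, 24: 32000}
observed = {20: 2400, 22: 14000, 24: 81000}

def excess_noise(n_model, n_observed):
    """Fractional increase in Poisson noise from having n_model instead of
    n_observed galaxies, assuming noise proportional to 1/sqrt(N)."""
    return math.sqrt(n_observed / n_model) - 1.0

excess = {mag: excess_noise(model[mag], observed[mag]) for mag in model}
for mag, value in excess.items():
    print(f"R_c = {mag}: {100 * value:.0f}% more noise")
```

The three values come out within about a percentage point of the 20%, 27% and 60% given above.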

A secondary limitation of our modeling is that we have 
treated the effects of evolution in the luminosity function 
very simply, using a passive evolution model in which stars 
form in a single burst at $z_f = 2$. Particularly for the LSST
field, where the median redshift is $z \simeq 0.6$, such a model
is too simplistic. A more realistic model would require the 
identification and treatment of individual galaxy types. 
The inclusion of galaxy types whose evolution is faster 
than a passive model would help reduce the discrepancy 
in the number of galaxies. However, given the problems 
with the halo multiplicity function and the form of the 
luminosity function, we felt that adding a more detailed 
treatment of evolution should await a better underlying 
simulation. 

4. FINDING CLUSTERS 

Our objective is to automatically produce catalogs of
cluster candidates from the synthetic fields which we can
then check using our knowledge of the true mass distributions.
We do this using the matched filter method described
in §4.1, adding some comments on how it can be
adapted to real data or further improved in §4.2. In §4.3
we discuss the diagnostics we use to compare the output
cluster catalog to the true clusters.



4.1. The Matched Filter Algorithm 

We searched for clusters using an automated version of
the Adaptive Matched Filter (AMF) algorithm (Kepner et
al. 1999), which is itself based on the "matched filter"
algorithm of Postman et al. (1996).



We model the density of galaxies as a redshift-dependent
background $\rho_b(z)$ and a distribution of $k = 1 \cdots n_c$ clusters.
Cluster $k$ is described by its angular position $\theta_k$,
(proper) scale length $r_{ck}$, redshift $z_k$ and galaxy number
$N_k$. At the corresponding angular diameter distance
$D_A(z_k)$ the density of galaxies associated with the cluster is

$$N_k\,\Sigma[(\theta - \theta_k)D_A(z_k)/r_{ck}]\,\delta(z - z_k) \qquad (4)$$

where we use a projected profile

$$\Sigma(x) \propto 1/(1 \ldots \qquad (5)$$

normalized by $\int \Sigma(x)\,d^2\theta = 1$ for the angular distribution
and (assuming redshift errors large compared to
cluster velocity dispersions) a delta function for the redshift
distribution. The simple, analytic profile defined
by Eq. (5) provides a good match to a projected NFW
(Navarro, Frenk & White 1996) profile. We fixed the
halo concentration to $c = 4$ and the break radius to
$r_c = 200\,h^{-1}$kpc, as typical parameters for cluster-mass
halos (e.g. Bullock et al. 2001). For a galaxy $i$ located
at $\theta_i$ and redshift $z_i$ with assumed Gaussian uncertainties
characterized by $\sigma_i$ the expected density for cluster $k$ is

$$\rho_k(\theta_i, z_i) = N_k\,\Sigma[(\theta_i - \theta_k)D_A(z_k)/r_{ck}] \times \frac{\exp[-(z_i - z_k)^2/2\sigma_i^2]}{\sqrt{2\pi}\,\sigma_i} \qquad (6)$$

and the predicted density for galaxy $i$ becomes



$$\rho(\theta_i, z_i) = \rho_b(z_i) + \sum_{k=1}^{n_c} \rho_k(\theta_i, z_i) \qquad (7)$$



where the background model must also be modified to in- 
clude the effects of the redshift uncertainties. 




Fig. 4. - The "selection" functions adopted for surveys to the
limiting R magnitudes listed. The lines give the probability $p(z)$
that a galaxy at a given redshift is included in the final survey.

The Gaussian redshift uncertainties model any information
used to estimate the redshift of the galaxies in the survey.
At its crudest this estimate comes only from the flux
(magnitude) of the galaxy, and at its best it comes from
direct spectroscopic redshifts. We are interested in the
intermediate case where we possess photometric redshift
estimates, presumably derived from galaxy colors, with
accuracies in a range from $0.05 \lesssim \sigma_z \lesssim 0.1$.

The likelihood of the model over an area $A$ encompassing
all the clusters and galaxies $i = 1 \cdots n_g$ is

$$\ln\mathcal{L} = \sum_{i=1}^{n_g} \ln\rho(\theta_i, z_i) - A\int dz\,\rho_b(z) - \sum_{k=1}^{n_c} N_k \qquad (8)$$



which is derived from the Poisson statistics of galaxies
distributed over infinitesimal bins in redshift and angle
(this is termed the "fine" likelihood by Kepner et al. 1999
as compared to a "coarse" likelihood based on Gaussian
statistics).

We build the model iteratively starting from a smooth 
background ($n_c = 0$). At each step we use the galaxy positions
and redshifts as trial cluster centers, optimizing the
likelihood with respect to the next cluster richness $N_k$ with
$k = n_c$ but holding the properties of the background and
all previous clusters fixed. We add the trial cluster pro- 
ducing the largest increase in the likelihood to the global 
model and then search for the next cluster, continuing the 
process until the likelihood gain drops below a threshold. 
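The one-parameter richness optimization at each step can be sketched as follows. The sketch assumes a Poisson point-process likelihood in which a trial cluster of richness $N$ adds $N s_i$ to the model density at galaxy $i$ (with $s_i$ the unit-richness cluster density there, normalized to integrate to one over the survey) and subtracts $N$ from $\ln\mathcal{L}$. The change in log-likelihood is then $\sum_i \ln(1 + N s_i/b_i) - N$, where $b_i$ is the density of the current model at galaxy $i$. The function names and the bisection solver are our choices; the text does not specify the optimizer.

```python
import math

def likelihood_gain(current, unit_cluster, richness):
    """Change in ln(likelihood) from adding a cluster of the given richness.
    current[i] > 0 is the existing model density at galaxy i; unit_cluster[i]
    is the trial cluster density at galaxy i for unit richness."""
    return sum(math.log1p(richness * s / b)
               for b, s in zip(current, unit_cluster)) - richness

def optimize_richness(current, unit_cluster, tol=1e-10):
    """Richness N >= 0 maximizing likelihood_gain. The gain is concave in N,
    so its derivative has at most one root, located here by bisection."""
    def slope(n):
        return sum(s / (b + n * s) for b, s in zip(current, unit_cluster)) - 1.0

    if slope(0.0) <= 0.0:
        return 0.0  # no positive richness improves the fit
    lo, hi = 0.0, 1.0
    while slope(hi) > 0.0:  # bracket the root; slope < 0 once N exceeds n_g
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

background = [1.0, 1.0, 1.0]
trial = [2.0, 2.0, 0.0]  # third galaxy lies outside the trial cluster
best = optimize_richness(background, trial)
print(best, likelihood_gain(background, trial, best))
```

In an iterative search of this kind, the trial center with the largest gain would be added to the model and its density folded into `current` before the next pass.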

Our approach differs from that described by Kepner et
al. (1999) in several respects. First, we make no use of
the "coarse" (Gaussian) statistical model. After careful
arrangement of the calculation and construction of linked
lists, our execution time was not dominated by the optimization
of the Poissonian likelihood with respect to $N_k$.
Second, our density model explicitly includes the distri- 
bution and structure of previously found clusters. This 
automates the algorithm and provides a reasonable ap- 
proach to separating overlapping clusters and minimiz- 
ing multiple discoveries of the same cluster. In essence, 
we have combined the AMF algorithm for finding clusters 
with the Clean algorithm of radio astronomy for produc- 
ing maps. Third, rather than simply clipping the redshift 
catalog to bracket the redshift of a trial cluster, we explic- 
itly include the error-convolved redshift distribution of the 
cluster galaxies as part of the density model. 

4.2. Performance, Tuning & Refinements 



Table 1

Characteristics of the simulated fields. Number simulated,
size of field, number density of galaxies (per square degree)
and the redshifts encompassing 50%, 75% and 90% of the
survey galaxies.

Name   N_field   Size          Galaxies per sq. deg.   z (50%/75%/90%)
MMTS   9         6.0° x 6.0°   1700                    0.23/0.30/0.36
SDSS   9         3.0° x 3.0°   8700                    0.38/0.48/0.57
LSST   4         1.5° x 1.5°   32600                   0.59/0.77/0.95



For theoretical convenience we defined our algorithm 
purely in terms of redshift uncertainties, although it would 
be trivial to redefine it in terms of luminosities and colors. 
A combination of the luminosity function and spectrophotometric
models would provide predictions $m_j(z)$ for the
measured magnitudes $m_{ij}$ of galaxy $i$ in filters $j = 1 \cdots n_f$
as a function of redshift, and our Gaussian redshift error
would be replaced by the fit statistic between the model and the
ences in galaxy properties between the field and clusters. 

We ran our experiments with a fixed cluster scale
$r_c = 200\,h^{-1}$kpc and concentration $c = 4$, although in theoretical
models clusters have a range of scales, $100$-$500\,h^{-1}$kpc,
and concentrations, $c \sim 4$-$8$. We experimented with
varying the scale radius and found that the algorithm
was biased towards allowing $r_c$ to become unreasonably
large for some, but not all, cluster candidates. Although
we did not conduct further experiments, the problem
could be solved by adding a prior probability term
for either the scale radius or the cluster mass to bias the
solutions against finding overly large or massive clusters.
The most natural prior is a simple model for the cluster
mass function such as a $P(N) \propto 1/N^2$ power-law. We
also found that if we increased the outer fit radius too
much (i.e. larger $c$ at fixed $r_c$) the algorithm systematically
merged neighboring clusters. With $c = 4$ this rarely
happened, although occasionally a rich, nearby cluster was 
split into more than one 'candidate'. 

We constructed our background model field-by-field based 
on a coarsely binned redshift distribution of the galaxies 
with their assigned redshifts (i.e. including the redshift er- 
ror and scatter). The continuous distribution was then 
obtained by linear interpolation between the bins, which 
proved sufficient for our purposes and more stable than 
spline interpolation. 

The initial distribution of likelihoods $\ln\mathcal{L}$ is relatively
well modeled as a log-normal distribution with a tail to 
higher likelihoods. To set the termination point of our al- 
gorithm we fit the initial likelihood distribution to deter- 
mine the mean and dispersion after rejecting likelihoods 
more than two standard deviations from the mean. We 
empirically set the stopping point at a likelihood threshold 
corresponding to the mean plus 1.3 standard deviations, 
which would mean that 90% of the galaxies were below 
the threshold for a Gaussian distribution. 
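
The stopping rule can be illustrated with the short Python sketch below. It assumes a single clipping pass at two standard deviations and uses the population standard deviation; whether the original fit iterated the clipping is not stated, so both choices are assumptions. The function name `stopping_threshold` is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def stopping_threshold(ln_like, clip=2.0, n_sigma=1.3):
    """Mean plus n_sigma dispersions after one clipping pass.

    ln_like : list of initial likelihood values
    clip    : rejection limit in standard deviations (assumed single pass)
    """
    mean0 = fmean(ln_like)
    sd0 = pstdev(ln_like)
    kept = [x for x in ln_like if abs(x - mean0) <= clip * sd0]
    return fmean(kept) + n_sigma * pstdev(kept)


# Fraction of a Gaussian distribution lying below mean + 1.3 sigma
gaussian_fraction = NormalDist().cdf(1.3)

# Toy input: the outlier at 100 is rejected by the clipping step
threshold = stopping_threshold([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
```

For a Gaussian, `gaussian_fraction` evaluates to roughly 0.90, consistent with the 90% quoted above.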

Finally, we optimized only the properties of the new 
cluster in estimating the likelihood. The performance of 
the algorithm might be enhanced by simultaneously opti- 
mizing the richnesses of any overlapping clusters. 

4.3. Diagnostics 

After deriving the output cluster catalog we match it to
both the input cluster catalog and the galaxy catalog. For
each galaxy we have the probability $p_b$ that the galaxy is
in the field and the probabilities $p_k$ that it is in any of
the $k = 1 \cdots n_c$ cluster candidates. We assigned galaxies
to clusters by first finding the most probable cluster
for the galaxy (the index $k$ with the maximum $p_k$ for the
galaxy) and then assigning it to the cluster if $p_k > p_b$.
For comparison to the fitted cluster number $N_k = N_{\rm fit}$, we
also counted the number of galaxies, $N_\Delta$, above a range
of contrast thresholds where $p_k > \Delta\,p_b$. Our basic assignment
procedure used a contrast $\Delta = 1$ and will have
more background contamination than a higher threshold.
We find that $N_1$ tracks $N_{\rm fit}$ closely, but with more scatter.
These estimates for the number of member galaxies can be
compared to the true number, $N_{\rm true}$, of galaxies assigned
to the cluster.
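
The assignment rule lends itself to a compact illustration. The Python sketch below is not the code used for this work; the list-based data layout and the names `assign_galaxies` and `member_counts` are hypothetical, and ties between candidates are resolved in favour of the lowest index, which is an arbitrary choice.

```python
def assign_galaxies(p_background, p_cluster, contrast=1.0):
    """Assign each galaxy to its most probable candidate, or to the field.

    p_background : list, field probability p_b for each galaxy
    p_cluster    : list of lists, p_cluster[i][k] = p_k for galaxy i
    contrast     : required ratio p_k / p_b (Delta in the text)
    Returns a list holding a candidate index or None (field) per galaxy.
    """
    assignment = []
    for p_b, p_k in zip(p_background, p_cluster):
        if not p_k:
            assignment.append(None)
            continue
        best = max(range(len(p_k)), key=lambda k: p_k[k])
        assignment.append(best if p_k[best] > contrast * p_b else None)
    return assignment


def member_counts(assignment, n_clusters):
    """Number of galaxies assigned to each candidate."""
    counts = [0] * n_clusters
    for k in assignment:
        if k is not None:
            counts[k] += 1
    return counts


p_b = [0.2, 0.5, 0.1]
p_k = [[0.3, 0.5], [0.3, 0.2], [0.45, 0.45]]
unit_contrast = assign_galaxies(p_b, p_k, contrast=1.0)
high_contrast = assign_galaxies(p_b, p_k, contrast=4.0)
```

Raising `contrast` can only remove galaxies from a candidate, which is why the higher-contrast counts are less contaminated by the background but eventually undercount rich systems.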

To match the output cluster catalogue to the input cata- 
logue we used position and redshift information and in ad- 
dition the modal parent halo mass of the galaxies assigned 
to the output cluster. This results in a unique match ex- 
cept in cases where a nearby rich cluster is broken into 
several candidates by the group finder, in which case that 
cluster can be flagged more than once. The matching is 
done in two directions, the best match from the input cat- 
alog to each cluster in the output list and the best match 
from the output catalog to each cluster in the input list. 
It is the latter, with duplicates trimmed, that we use to 
estimate completeness. 

It occasionally happens that two clusters overlap on the
sky and lie within $2\sigma$ of each other in the redshift direction.
In these cases our algorithm often misses one of the
clusters, assigning its galaxies to the overlapping cluster.
For the MMT survey this occurred slightly more than once
per field, for a total of 12 missed clusters in the 9 fields.
As the redshift error increases this number also increases,
quadrupling for $\sigma_z = 0.05$.

5. RESULTS 

We illustrate our results by examining our ability to produce
a catalog of clusters with masses above $2\times10^{14}\,M_\odot$,
as these are the clusters which provide the greatest con- 
straints for cosmology. Our assumption is that the catalog 
is an intermediate step, with further optical, X-ray or SZ 
observations being used to clean and calibrate the sam- 
ple. Thus we discuss only the identification of clusters 
and their members rather than the derivation of physical 
parameters. The selection of the catalog will represent 
a trade-off between completeness and contamination, with 
the contamination arising either from real, but lower mass, 
clusters or complete artifacts. We use our knowledge of the 
true cluster properties to design selection procedures for 
attaining this goal (see Appendix). 

Fig. |^ shows the distribution of matched clusters in likelihood
and redshift for the MMT redshift survey, marking
the ones above our $2\times10^{14}\,M_\odot$ mass threshold. As
expected, higher mass and lower redshift lead to higher
likelihoods, but there is no sharp boundary between high
and low mass clusters. However, there is clearly a redshift-dependent
likelihood threshold which would keep the completeness
(fraction of $M > 2\times10^{14}\,M_\odot$ clusters found)
high, the contamination (fraction of $M < 2\times10^{14}\,M_\odot$
clusters or false detections) low, and both roughly independent
of redshift. If we simplify the likelihood calculation
(Eq. ^|) by assuming a top hat cluster density profile,
then we can show that the leading terms in the likelihood
depend only on the number of galaxies in the cluster, with
$\Delta\ln\mathcal{L} \propto N_{\rm true}$ to lowest order.$^1$ As shown in Fig. 6, the
likelihood scales in this manner for the data as well.

Thus, although there is considerable scatter due to differences
in the structure of the cluster, the distribution of
galaxies and the cluster environment, we adopt a likelihood
threshold designed to track the number of galaxies
expected in a cluster of fixed mass. For a Schechter luminosity
function of slope $\alpha$ and an evolving characteristic
luminosity $L_*(z)$, this corresponds to a likelihood threshold
which decreases as $\Gamma[1+\alpha, L(z)/L_*(z)]$ with redshift,
where $L(z) = 4\pi D_{\rm lum}^2 F$ is the luminosity corresponding to
the survey flux limit. For further experiments we set our
survey thresholds using our knowledge of the true masses.
In a real survey the thresholds would have to be calibrated
using clusters of known mass. Fig. || illustrates
the redshift-dependent likelihood cuts $\Delta\ln\mathcal{L}_{\rm cut}(z)$ for a
range of local normalizations $\Delta\ln\mathcal{L}_{\rm cut}(z = 0)$.

1 For very large numbers of galaxies the scaling becomes
$\Delta\ln\mathcal{L} \propto N_{\rm true}\ln N_{\rm true}$.

We used the most common parent halo mass of the 
galaxies identified with a cluster candidate to identify the 
input halo corresponding to the candidate. This procedure 
led to multiple identifications of the most probable, low 
redshift, massive clusters where we would find lower like- 
lihood satellite clusters most of whose galaxies are mem- 
bers of the more massive cluster. This is at least in part 
due to our fixed filter profile whose $r_c = 0.2\,h^{-1}$ Mpc is
somewhat smaller than the break radius of the most mas- 
sive clusters. We have automatically filtered these out by 
dropping cluster candidates with the same modal mass as 
a more likely cluster and within a projected separation of 
$1\,h^{-1}$ Mpc and a redshift difference of $\Delta z = 0.05$. For the
MMTS model survey these represented 5% of the cluster 
candidates found. In a real survey, where we would lack 
the knowledge of the parent masses, these false candidates 
would be initially identified as lower mass clusters in the 
halo of a massive cluster and then eliminated by more 
careful modeling of the structure of the most massive can- 
didates. 
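
The automatic filtering of satellite candidates can be sketched as follows. This is an illustrative Python fragment under stated assumptions: each candidate is a dictionary with hypothetical keys `lnL`, `halo` (an identifier standing in for the modal parent halo mass), projected coordinates `x`, `y` in $h^{-1}$ Mpc, and redshift `z`. Comparing each candidate only against the more likely candidates already kept is also a choice made for this sketch.

```python
from math import hypot


def drop_satellites(candidates, max_sep=1.0, max_dz=0.05):
    """Remove candidates sharing a parent halo with a more likely neighbour.

    max_sep : projected separation limit in h^-1 Mpc
    max_dz  : redshift difference limit
    """
    kept = []
    # Visit candidates from most to least likely
    for cand in sorted(candidates, key=lambda c: c["lnL"], reverse=True):
        duplicate = any(
            cand["halo"] == k["halo"]
            and hypot(cand["x"] - k["x"], cand["y"] - k["y"]) <= max_sep
            and abs(cand["z"] - k["z"]) <= max_dz
            for k in kept
        )
        if not duplicate:
            kept.append(cand)
    return kept


candidates = [
    {"lnL": 100.0, "halo": 7, "x": 0.0, "y": 0.0, "z": 0.10},
    {"lnL": 40.0, "halo": 7, "x": 0.5, "y": 0.0, "z": 0.12},  # satellite
    {"lnL": 30.0, "halo": 7, "x": 3.0, "y": 0.0, "z": 0.10},  # too far away
    {"lnL": 20.0, "halo": 9, "x": 0.1, "y": 0.0, "z": 0.10},  # other halo
]
cleaned = drop_satellites(candidates)
```

In a real survey the `halo` key would not be available, so this step only applies to mock catalogs where the parent halos are known.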

Fig. 6. — Likelihoods for $M > 2\times10^{14}\,M_\odot$ clusters as a function of
their true galaxy number $N_{\rm true}$. The line is a linear fit $\Delta\ln\mathcal{L} \propto N_{\rm true}$
for the systems with $N_{\rm true} > 10$.

We also find real high mass clusters with anomalously
low likelihoods. Many of these are edge effects, where the
cluster is within $0.5\,h^{-1}$ Mpc of the field edge. We made
no modifications to our algorithm to adjust the likelihoods
for the field edges. We also find a very small number of
overlapping high mass clusters in which one cluster absorbs
galaxies from the other, leading to one overly high likelihood
cluster and one overly low likelihood cluster. While
the low likelihood cluster may drop below our selection
thresholds, because it overlaps with a cluster above the
threshold, later studies with more accurate redshifts will
correct the confusion. Errors in finding the clusters in the
original N-body simulation can also be interpreted as completeness
and contamination problems - the FoF algorithm
can combine merging clusters into a single more massive
cluster which our search algorithm then rediscovers as a
pair of merging clusters of lower likelihood than expected
given the FoF mass estimate. This effect is somewhat exacerbated
by our use of the canonical but relatively large
linking length $b = 0.2$.

The next step in defining a sample is to select $\Delta\ln\mathcal{L}(0)$,
the zero redshift normalization of the likelihood selection
function. Fig. 7 illustrates how the completeness and false
positive fraction depend on the likelihood threshold for
each of our model surveys. We include all cluster candidates
inside the redshift encompassing 90% of the survey
galaxies (see Table [j]). The equivalent curves for lower
redshift thresholds will have lower false positive rates for
the same completeness because the number of false positives
rises with redshift. We define the completeness as the
fraction of clusters inside this redshift limit with masses
$M_{200} > 2\times10^{14}\,h^{-1}M_\odot$ which are found in the
cluster catalog with likelihoods above the threshold. As
we raise the likelihood threshold, the completeness declines.
False positives are candidate clusters with likelihoods
exceeding the threshold which do not correspond to
an $M_{200} > 2\times10^{14}\,h^{-1}M_\odot$ cluster. We distinguish two types
of false positives - candidates identified with real but less
massive groups, and candidates we could not identify with
any group (a "false" group). Higher likelihood thresholds
reduce the numbers of false positives.
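
These definitions translate directly into a few lines of bookkeeping. The Python sketch below is illustrative only: the dictionary keys `lnL`, `z` and `match`, the function name `catalog_statistics`, and the representation of the redshift-dependent cut as a callable are all assumptions of the example.

```python
def catalog_statistics(candidates, massive_ids, cut):
    """Completeness, false positive fraction and false group fraction.

    candidates  : list of dicts with keys "lnL", "z" and "match", where
                  "match" is the matched input cluster id or None
    massive_ids : set of input cluster ids above the mass threshold
    cut         : function z -> likelihood threshold at redshift z
    """
    selected = [c for c in candidates if c["lnL"] > cut(c["z"])]
    # A massive cluster counts once even if matched by several candidates
    found = {c["match"] for c in selected if c["match"] in massive_ids}
    n_false = sum(1 for c in selected if c["match"] not in massive_ids)
    n_false_group = sum(1 for c in selected if c["match"] is None)
    n_sel = len(selected)
    completeness = len(found) / len(massive_ids)
    false_fraction = n_false / n_sel if n_sel else 0.0
    false_group_fraction = n_false_group / n_sel if n_sel else 0.0
    return completeness, false_fraction, false_group_fraction


toy_candidates = [
    {"lnL": 50.0, "z": 0.1, "match": 1},     # massive cluster, selected
    {"lnL": 30.0, "z": 0.2, "match": 3},     # real but less massive group
    {"lnL": 25.0, "z": 0.3, "match": None},  # "false" group
    {"lnL": 5.0, "z": 0.1, "match": 2},      # massive cluster below the cut
]
stats = catalog_statistics(toy_candidates, {1, 2}, lambda z: 20.0)
```

Both false fractions are normalised by the number of selected candidates, so the false group fraction is always a subset of the false positive fraction.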

Complete redshift surveys, as illustrated by the MMT
survey in Fig. 7, easily produce very complete cluster samples
to redshifts well past the survey median.$^2$ While
there are few false groups, the overall false positive fraction
is significant and it is probably impossible to eliminate
this problem. When we combine a noisy mass estimator
(see below) with the very steep cluster mass function
(see Fig. [I), many apparently massive clusters will
be lower mass clusters with overestimated masses (a form
of Malmquist bias). Most of the false positives in the
MMT survey are real groups or clusters in the mass range
$3\times10^{13}\,h^{-1}M_\odot < M_{200} \lesssim 10^{14}\,h^{-1}M_\odot$ (see Fig. 7). We
discuss this mathematically in an Appendix. The completeness
of the survey at low redshift is underestimated
by our basic matching software. Of the 31 clusters missed
for low likelihood thresholds, 3 (7) are within 1 (4) arcmin
of a field edge and 4 have virial radii overlapping
that of another massive cluster with a redshift difference
smaller than $2\sigma_z$. As we change the mass threshold, we
can maintain high completeness out to the redshift where
the typical cluster at the mass threshold contains three
galaxies, an effect which is well described by the Poisson
model for the survey described in the Appendix.

2 Bear in mind that our model for a redshift survey is very conservative,
since a real redshift survey has velocity measurement errors
under 100 km/s and even rich clusters have velocity dispersions not
much larger than 1000 km/s, while a 1% redshift error at $z = 0.1$
corresponds to 3000 km/s.

As we switch from spectroscopic redshift information
to photometric redshifts the completeness achievable for a
given false positive rate declines, as illustrated in Fig. 7 by
the SDSS and LSST survey models. For runs with larger
errors than shown here we even have difficulty performing 
the match between the input and output catalogs using the 
most common parent halo mass of the galaxies identified 
with a given cluster candidate. 

Next we selected a likelihood cutoff where we estimate 
that the survey would be 80% complete and determined 
the completeness and false positive fractions of the resulting
catalogs as a function of redshift (Fig. 8). Because of
the design of the likelihood cut function, the completeness 
is nearly constant out to the redshift encompassing 90% of 
the survey galaxies. The false positive fraction generally 
rises with redshift, and catalogs extending to lower red- 
shifts can have significantly lower contamination for the 
same level of completeness. The apparent drop in the 
completeness of the MMT model survey at low redshift 
is due in part to edge effects but also to the fact that 
we have required an extremely high likelihood threshold 
to remove low mass systems. The false positive rate rises 
faster with redshift in the SDSS survey because complete 
redshift information is available only for the nearby galax- 
ies (z < 0.2). At these low redshifts, the addition of the 
deeper photometric catalog to the redshift data appears 
to improve the performance significantly. 

The redshift dependence of the completeness is well de- 
fined by the Poisson model for the survey developed in 
the Appendix. For a likelihood threshold roughly corre- 
sponding to the number of galaxies in a cluster at the 
threshold mass, the survey will be nearly complete up to 
the redshift where the average number of galaxies in the 
threshold mass cluster drops below about 3 galaxies. The 
Poisson model works less well for explaining the fraction of 
false positives. Adding a small number of background galaxies
to each cluster does explain the rapid rise in the false
positive fraction with redshift. The model underpredicts 
the false positive fraction at lower redshifts, probably be- 
cause the contamination from the background distribution 
of galaxies is poorly described by a simple Poisson model. 

Finally, we explore the correlation of our model cluster
parameters with the true properties of the cluster. Fig. 9
compares estimates for the number of cluster galaxies with
the true number. The number of galaxies estimated in the
likelihood, $N_c$, closely matches the true number in rich
clusters. But, in a catalog selected based on the cluster
likelihood, we tend to find clusters with small galaxy excesses
compared to the real cluster. Roughly speaking, the
fit parameter $N_c$ usually finds 5-10 more galaxies than
were actually in the cluster. We can also estimate the
number of galaxies by counting the number of galaxies
whose probability of being a cluster member exceeds their
probability of being in the background by a factor of $\Delta$.
The number at unit contrast, $N_1$, tracks $N_{\rm fit}$ closely with
some additional scatter. Higher contrast values provide
better discrimination against background contamination.
Fig. 9 also compares $N_2$ and $N_4$ with $N_{\rm true}$. At an intermediate
contrast $\Delta = 2$, $N_2$ is less biased for systems with small
numbers of galaxies but begins to underestimate the numbers
of galaxies in systems with large numbers of galaxies.
These trends become clearer for the higher contrast level of
$\Delta = 4$. Inspection of individual systems with extreme ratios
$N_{\rm fit}/N_{\rm true}$ provides no guidance towards an improved
estimator. They tend to be relatively massive systems
with modest numbers of galaxies where the finger-of-god
from the velocity dispersion overlaps a larger than average
number of galaxies.

Fig. 7. — The survey completeness and false positive rates. The top, middle and bottom rows illustrate the properties of the MMTS, SDSS and
LSST model surveys as a function of the zero redshift likelihood cut $\ln\mathcal{L}(z = 0)$. The left column shows the completeness defined by the fraction
of $M_{200} > 2\times10^{14}\,h^{-1}M_\odot$ clusters found in the survey out to the redshift encompassing 90% of the survey galaxies. The middle column shows
the false positive fraction defined by the fraction of cluster candidates above the likelihood threshold which are not $M_{200} > 2\times10^{14}\,h^{-1}M_\odot$
clusters. The right column shows the false group fraction defined by the fraction of cluster candidates above the likelihood threshold which
are not identified with any input cluster. For the MMT survey the dashed line in the false positive column shows the fraction of candidates
which correspond to slightly less massive clusters with $3\times10^{13}\,h^{-1}M_\odot < M_{200} < 2\times10^{14}\,h^{-1}M_\odot$. For the SDSS and LSST surveys the line
patterns show the assumed photometric redshift errors of $\sigma_z = 0.05$ (solid) and 0.10 (dashed).

Fig. 8. — The survey completeness and false positive rates. The top, middle and bottom rows illustrate the properties of the MMTS, SDSS
and LSST model surveys as a function of redshift. The columns show the effects of increasing errors in the redshift estimates. These are fixed
to $\sigma_z = 0.01$ for the MMT survey and are $\sigma_z = 0.05$ (left) and 0.10 (right) for the SDSS and LSST model surveys. The solid histograms show
the completeness, the fraction of $M_{200} > 2\times10^{14}\,h^{-1}M_\odot$ clusters found by the survey, and the dashed histograms show the fraction of false
positives in the survey. This includes both real, but less massive clusters and false groups, but is generally dominated by real clusters with
$3\times10^{13}\,h^{-1}M_\odot < M_{200} < 2\times10^{14}\,h^{-1}M_\odot$. The solid (dashed) curves show the Poisson model for the completeness (false positive fraction)
derived in the Appendix. The vertical dashed lines mark the redshift encompassing 50%, 75% and 90% of the survey galaxies.

Finally we can use the probability that any galaxy is
a cluster member to estimate the cluster redshift. The
estimates scale as expected, as shown in Fig. 10. In general
it is possible to compute any cluster property (e.g. velocity
dispersion) using such a probability weighting. We shall
defer discussion of such estimators to a future publication.
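
One simple form such a probability weighting could take is a weighted mean and weighted dispersion of the galaxy redshifts. The Python sketch below shows that form only as an assumption, since the estimators themselves are deferred to later work; the function names are hypothetical.

```python
from math import sqrt


def weighted_redshift(z, p):
    """Membership-probability-weighted mean redshift.

    z : list of galaxy redshifts
    p : list of probabilities that each galaxy belongs to the cluster
    """
    return sum(pi * zi for pi, zi in zip(p, z)) / sum(p)


def weighted_dispersion(z, p):
    """Weighted rms scatter of the redshifts about the weighted mean."""
    z_bar = weighted_redshift(z, p)
    variance = sum(pi * (zi - z_bar) ** 2 for pi, zi in zip(p, z)) / sum(p)
    return sqrt(variance)


z_cluster = weighted_redshift([0.1, 0.2, 0.3], [1.0, 1.0, 0.0])
sigma_cluster = weighted_dispersion([0.1, 0.2, 0.3], [1.0, 1.0, 0.0])
```

Galaxies with zero membership probability drop out of both sums, so field galaxies do not shift the estimate.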

6. DISCUSSION 

Our tests of the matched filter method for finding clus- 
ters in optical surveys with either photometric or spec- 
troscopic redshifts show that it is an efficient and reli- 
able means of identifying massive clusters even when the 
redshift estimates are crude. In redshift surveys, where 
cluster surveys have usually used FoF methods rather 
than matched filters, the method works extremely well. 
By selecting clusters using a redshift-dependent likelihood 
threshold roughly tracking the expected number of galax- 
ies from a cluster of fixed mass, we can construct catalogs 
with high completeness, low contamination and both vary- 
ing little with redshift. The method automatically assigns 
a probability that each galaxy is a member of any cluster, 
which can be used to estimate cluster properties such
as redshift or velocity dispersion. 

Fig. 10. — Estimated versus true cluster redshifts.

Both the completeness and the contamination in our
mock surveys can be understood in part using a simple
analytic model (described in the Appendix). The largest 
effect is the well known scatter in optical richness at a fixed 
cluster mass. Due to the steeply falling mass function of 
clusters this scatter implies that any sample selected on the 
basis of a fixed number of galaxies will be contaminated by 
abnormally rich, low-mass clusters. In our mock surveys 
this is by far the largest effect, with "false" clusters being 
almost entirely absent. We find that the false positive rate 
increases with redshift, so that our samples will have less 
contamination if restricted to a lower redshift cutoff than 
the 90th percentile we have assumed throughout. In any 
case, the likelihood threshold can be adjusted to modify 
either the desired completeness or contamination.

The addition of photometric data for fainter galaxies to 
a redshift survey, as in the SDSS at low redshift, consid- 
erably improves the detection of clusters over the redshift 
data alone. The redshift catalog, by pinning down the 
foreground contamination, probably improves the detec- 
tion of higher redshift clusters which are detectable only 
in the photometric data. 

We thus expect that the matched filter method can in 
future be used to construct large samples of clusters to 
"modest" redshifts, though follow-up observations will be 
necessary to clean the sample. Apart from clusters lying 
near the edge of our fields, the most common misidenti- 
fication was to split very large clusters into a core and 
satellite population. This occurred due to our assumption 
of fixed core radius and concentration. The second most 
common misidentification was when two massive clusters 
overlapped, with one 'stealing' galaxies from another and 
causing it to drop below our likelihood threshold. In both 
cases any confusion would be immediately eliminated by 
more careful follow-up observations and modeling of the 
cluster region. 

Our investigation is but a first step, and further work 
is needed. The primary limitation of our mock surveys 
is the inadequacy of current methods for populating dark 
matter halos with galaxies. Our simulations have too few 
galaxies compared to real surveys (by 30%, 40%
and 60% for $R < 20$, 22 and 24 mag). While simulations
with a higher dynamic range would be a step in the right 
direction, significant uncertainties remain in the modeling 
of N(M) and how it depends on luminosity and redshift. 
Encouragingly, our results should be conservative in this
respect. It is also necessary to apply this method to real 
data, to uncover any failure modes which have been missed 
by the simulations. 

Fig. 9. — Density contours for various estimates of the number of galaxies in the cluster versus the true number of galaxies $N_{\rm true}$. From
bottom to top we compare the true number of galaxies to the estimated number from the likelihood function $N_{\rm fit} = N_c$, the number of galaxies
$N_{\Delta=4}$, $N_{\Delta=2}$ and $N_{\Delta=1}$ with a likelihood contrast relative to the background larger than a factor of $\Delta = 4$, 2 and 1 respectively, and the
number of real cluster galaxies with a likelihood contrast relative to background larger than unity iV mot j e . The contours are spaced by factors
of 2 in the density. The smooth curves show lines where $N = N_{\rm true}$, $N = N_{\rm true} + 5$ and $N = N_{\rm true} + 10$.

APPENDIX

POISSON THEORY OF COMPLETENESS

The likelihood of finding a cluster is largely controlled by
the number of member galaxies. This allows us to make a
model for the tradeoff between completeness and contamination.
If the expected number of galaxies in a cluster
of mass $M$ and redshift $z$ is $\langle N\rangle = N(M)\,p(z)$ (see §3.3)
and the halo mass function is $dn/dM$ then the number of
clusters with $N_{\rm obs}$ galaxies is

$$\frac{dn}{dN_{\rm obs}} = dV\,dM\,\frac{dn}{dM}\,
\frac{\langle N(M,z)\rangle^{N_{\rm obs}}}{N_{\rm obs}!}\,
\exp\left(-\langle N(M,z)\rangle\right) \qquad {\rm (A1)}$$
for volume element $dV$ and assuming Poisson statistics,
expected for the high mass end of the mass function. If
we search for clusters of mass $M > M_0$, the total number
within redshift $z$ is

$$N_{\rm tot}(>M_0) = \int_0^z dV \int_{M_0}^{\infty} dM\,\frac{dn}{dM}. \qquad {\rm (A2)}$$

The cluster likelihood roughly scales as $\Delta\ln\mathcal{L} \propto N$ so
we will find clusters above a fixed mass threshold if we
scale the likelihood or the number of members with the
expectation value. The threshold is set by the number of
galaxies at $z = 0$, $n_0$, and then decreases with redshift as
$n(z) = n_0\,p(z)$. The cluster sample will contain

$$N_{\rm find}(>M_0, >n) = \int_0^z dV \int_{M_0}^{\infty} dM\,\frac{dn}{dM}\,
P\left[n(z), \langle N(M,z)\rangle\right] \qquad {\rm (A3)}$$

clusters above the mass threshold, where

$$P[n, N] = \frac{\Gamma[1+n, N]}{\Gamma[1+n]} \qquad {\rm (A4)}$$

is the fraction of clusters expected to have $N$ galaxies containing
at least $n$ galaxies. The completeness of a sample
selected with this criterion is $N_{\rm find}/N_{\rm tot}$. Since the likelihood
depends only on the number of galaxies, we also find
false positives from lower mass clusters with galaxy membership
above the threshold. The number of false positives is

$$N_{\rm false}(<M_0, >n) = \int_0^z dV \int_0^{M_0} dM\,\frac{dn}{dM}\,
P\left[n(z), \langle N(M,z)\rangle\right], \qquad {\rm (A5)}$$

and the fraction of cluster candidates which are false positives
is $N_{\rm false}/(N_{\rm find} + N_{\rm false})$.
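
A discrete version of this bookkeeping, at a single redshift, can be sketched in Python as follows. Two things are assumptions of the sketch rather than part of the model above: the probability of containing at least $n$ galaxies is evaluated as a direct Poisson sum for integer $n$ instead of through the incomplete gamma form of (A4), and the mass integrals are replaced by a sum over hypothetical mass bins. The bin contents in the example are invented for illustration.

```python
from math import exp


def poisson_at_least(n, mean):
    """P(X >= n) for a Poisson variable X with the given mean (integer n)."""
    term = exp(-mean)  # P(X = 0)
    below = 0.0
    for k in range(n):
        below += term
        term *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - below


def sample_fractions(bins, n_cut):
    """Completeness and false positive fraction from binned clusters.

    bins  : list of (number of clusters, expected galaxy count,
            True if the bin lies above the mass threshold)
    n_cut : minimum number of galaxies required for selection
    """
    n_tot = sum(num for num, _, above in bins if above)
    n_find = sum(num * poisson_at_least(n_cut, mean)
                 for num, mean, above in bins if above)
    n_false = sum(num * poisson_at_least(n_cut, mean)
                  for num, mean, above in bins if not above)
    return n_find / n_tot, n_false / (n_find + n_false)


# Invented example: few rich massive clusters, many poor low mass groups
completeness, false_fraction = sample_fractions(
    [(10.0, 20.0, True), (100.0, 2.0, False)], n_cut=10)
```

Because low mass systems are far more numerous, even a small Poisson tail per system contributes a non-zero false positive fraction.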

We can extend this basic theory to a more realistic
model for a cluster survey with two modifications. First,
a cluster must contain a minimum number of galaxies,
$N_{\rm thresh} = 3$, to be detected. This lower bound corresponds
to the likelihood threshold of the catalog, below
which the candidates are dominated by true false positives
with no correspondence to any cluster. We implement
it in the Poisson model by using a threshold
$n(z) = \max(n_0\,p(z), N_{\rm thresh})$. Second, the cluster catalogs
are also contaminated by unrelated galaxies. These
chance projections alter the apparent number of galaxies
associated with a cluster. Based on Fig. 9 we model these
chance projections as a Poisson process with an expectation
value of $N_b \sim 5$. For a cluster expected to have $N_c$
galaxies and a detection threshold of $j$, the probability of
the cluster including chance projections having at least $n$
galaxies at least $j$ of which are cluster members is



We are taking the sum over $i$, the possible level of background
contamination, weighted by its Poisson likelihood
given the value of $N_b$, multiplied by the probability that
the cluster will contain enough galaxies for the sum of the
number in the cluster and in the background to reach the
threshold.
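
One reading of this verbal description is sketched below in Python. It is an interpretation, not a transcription of the equation: for each background count $i$, weighted by its Poisson probability with mean $N_b$, the cluster itself must supply at least $\max(n - i, j)$ galaxies. The truncation of the sum at `i_max`, the direct Poisson sums and the function names are assumptions of the sketch.

```python
from math import exp


def poisson_at_least(n, mean):
    """P(X >= n) for a Poisson variable X with the given mean (integer n)."""
    term = exp(-mean)
    below = 0.0
    for k in range(n):
        below += term
        term *= mean / (k + 1)
    return 1.0 - below


def detection_probability(n, j, n_cluster, n_background, i_max=200):
    """Probability of reaching n galaxies with at least j true members.

    n            : required total number of galaxies
    j            : required number of true cluster members
    n_cluster    : expected number of cluster galaxies (N_c)
    n_background : expected number of chance projections (N_b)
    i_max        : truncation of the sum over background counts (assumed)
    """
    total = 0.0
    weight = exp(-n_background)  # Poisson probability of i = 0
    for i in range(i_max + 1):
        needed = max(n - i, j)  # members the cluster itself must supply
        total += weight * poisson_at_least(needed, n_cluster)
        weight *= n_background / (i + 1)
    return total


without_background = detection_probability(5, 3, 4.0, 0.0)
with_background = detection_probability(5, 3, 4.0, 5.0)
```

In this reading, background galaxies can only help a candidate reach the total $n$, while the requirement of $j$ true members caps the probability from above.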

We illustrate the behavior of this model in Fig. 8. We
fixed $N_{\rm thresh} = 3$ and we adjusted $n(0)$ to produce a low
redshift completeness slightly above the average observed
completeness. We scaled it to be slightly above because we
lose some clusters due to effects not in the model (edges,
overlapping clusters). We fixed the amount of background
contamination to $N_b = 5$, 10 and 15 for the MMT, SDSS
and LSST model surveys based very crudely on the offset
between $N_{\rm fit}$ and $N_{\rm true}$ in Fig. 9. The Poisson model describes
the completeness of the survey well, matching the
extended region of nearly constant completeness followed
by a sharp drop produced by the requirement for a finite
number of galaxies $N_{\rm thresh}$ in a cluster. The model describes
the false positive fraction less well. The rapid rise
in the false positive fraction near the drop in the completeness
is due to the Poisson fluctuations in the contamination.
However, the overall distribution of false positives
cannot be explained by the Poisson model.

The limitation of the Poisson model is implicit in the
wide range of likelihoods found for a fixed true number of
galaxies (see Fig. 6). While the likelihood roughly scales
with the true number of galaxies in the cluster, there is
significant scatter about the general trend. Clusters differ
not only in their total galaxy content, but also in their
internal structure (break radius, concentration), the sampling
of the internal structure, and the density of their
local environment. Any effects which increase the scatter
between the likelihood and $N_{\rm true}$ will produce more
false positives for a fixed level of completeness. The background
contamination is also more complicated than the
simple Poisson model, since the background galaxies are
themselves clustered. For example if the average background
contamination is $N_b = 4$ but galaxies are always
clustered in pairs, the likelihood of 6 contaminating galaxies
is nearly doubled.
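
The arithmetic behind this example can be checked with a few lines of Python. The sketch assumes that "clustered in pairs" means the pairs themselves arrive as a Poisson process with a mean of $N_b/2 = 2$, so that six contaminating galaxies correspond to three pairs; the variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson process."""
    return exp(-mean) * mean ** k / factorial(k)


# Six independent background galaxies with a mean of four
independent = poisson_pmf(6, 4.0)

# Three background pairs with a mean of two pairs (same mean galaxy count)
paired = poisson_pmf(3, 2.0)

ratio = paired / independent
```

Under this assumption the ratio comes out at about 1.7, in line with the statement that the likelihood is nearly doubled.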